Extract string parsing to a separate package #14772

nicolo-ribaudo · 2022-07-19T23:50:11Z

Q	A
Fixed Issues?
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass?	Yes
Documentation PR Link
Any Dependency Changes?
License	MIT

This was hard, but it lets us avoid an external dependency with some logic that we already have in #14757.

@liuxingbaoyu you should be able to do something like this:

let unterminatedCalled = false;
try {
  const error = () => { throw new Error(); };
  cooked = parseStringContents("template", raw, 0, 0, 0, {
    unterminated() { unterminatedCalled = true; },
    strictNumericEscape: error,
    invalidEscapeSequence: error,
    numericSeparatorInEscapeSequence: error,
    unexpectedNumericSeparator: error,
    invalidDigit: error,
    invalidCodePoint: error,
  });
  if (!unterminatedCalled) cooked = null; // we must consume the whole raw string
} catch {}

This PR mostly moves code around, with one main change: instead of directly updating this.state.{pos,curLine,lineStart}, we keep track of them in variables and we update them only after returning from the helper parser functions.

liuxingbaoyu · 2022-07-20T03:12:55Z

Amazing!

babel/babel.config.js

Lines 208 to 214 in 486d27a

    
           { 
        
             test: [ 
        
               "packages/babel-generator", 
        
               "packages/babel-plugin-proposal-decorators", 
        
             ].map(normalize), 
        
             plugins: ["babel-plugin-transform-charcodes"], 
        
           },

Can you add it here?

babel-bot · 2022-07-20T11:19:34Z

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/52568/

JLHwung · 2022-07-20T12:35:43Z

packages/babel-parser/test/fixtures/es2015/uncategorised/217/output.json

@@ -20,7 +20,7 @@
          "extra": {
            "rawValue": "\\u{}",
            "raw": "\"\\u{}\"",
-            "expressionValue": "null"
+            "expressionValue": ""


The new behaviour here is better than current main. Should we set expressionValue to be null when the string can not be successfully parsed?

I admit I don't know why the behavior changed 😬
But yes, null would be even better.

Actually, expressionValue is a property of directives that represents the .value property of the originally parsed StringLiteral. I feat that making it null would break anything relying on the fact that StringLiteral.value is a string, so I prefer the behavior currently implemented by this PR of just replacing invalid escapes with "" (for example, ab\u{}cd is abcd).

JLHwung

Awesome!

liuxingbaoyu

I did a lax benchmark with no significant performance penalty.

current 1 jquery 3.6: 40.11 ops/sec ±5.28% (25ms)
current 4 jquery 3.6: 9.5 ops/sec ±3.78% (105ms)
current 16 jquery 3.6: 2.22 ops/sec ±7.14% (451ms)
current 64 jquery 3.6: 0.54 ops/sec ±2.83% (1861ms)
baseline 1 jquery 3.6: 41.42 ops/sec ±5.53% (24ms)
baseline 4 jquery 3.6: 9.89 ops/sec ±2.5% (101ms)
baseline 16 jquery 3.6: 2.36 ops/sec ±5% (424ms)
baseline 64 jquery 3.6: 0.55 ops/sec ±9.05% (1830ms)

nicolo-ribaudo · 2022-07-20T15:42:08Z

Good! I believe rewriting all the tokenizer like this could actually improve performance, because we would skip all the this.state.* accesses and the this.... prototype lookup to call other tokenizer methods.

merceyz · 2022-07-21T11:48:20Z

packages/babel-helper-string-parser/package.json

+  "devDependencies": {
+    "charcodes": "^0.2.0"
+  },


I don't know if @babel/helper-string-parser will be bundled so asking to be on the safe side; shouldn't this be a normal dependency?

We have a plugin that inlines charcodes.*** into the corresponding codepoints: https://github.com/xtuc/charcodes/blob/master/packages/babel-plugin-transform-charcodes/README.md

nicolo-ribaudo added 3 commits July 20, 2022 01:32

Extract string parsing to an isolated file

b4b9bf1

Move to a new package

a98c722

Avoid infinite loop if errors.unterminated does not throw

f879f44

nicolo-ribaudo added PR: Internal 🏠 A type of pull request used for our changelog categories pkg: parser labels Jul 19, 2022

JLHwung self-requested a review July 20, 2022 00:17

nicolo-ribaudo added 2 commits July 20, 2022 13:08

Transform charcodes

ceee147

Fix regression

9971813

nicolo-ribaudo force-pushed the parser-extract-string-parsing branch from 43cc94a to 9971813 Compare July 20, 2022 11:15

JLHwung reviewed Jul 20, 2022

View reviewed changes

Flow

23418cd

nicolo-ribaudo force-pushed the parser-extract-string-parsing branch from 700d6f0 to 23418cd Compare July 20, 2022 13:19

JLHwung approved these changes Jul 20, 2022

View reviewed changes

liuxingbaoyu approved these changes Jul 20, 2022

View reviewed changes

nicolo-ribaudo merged commit ea5ff29 into babel:main Jul 20, 2022

nicolo-ribaudo deleted the parser-extract-string-parsing branch July 20, 2022 15:42

merceyz reviewed Jul 21, 2022

View reviewed changes

nicolo-ribaudo mentioned this pull request Jul 26, 2022

Fix position of errors in template literals after newlines #14798

Merged

github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Oct 21, 2022

github-actions bot locked as resolved and limited conversation to collaborators Oct 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extract string parsing to a separate package #14772

Extract string parsing to a separate package #14772

nicolo-ribaudo commented Jul 19, 2022 •

edited by gitpod-io bot

liuxingbaoyu commented Jul 20, 2022

babel-bot commented Jul 20, 2022 •

edited

JLHwung Jul 20, 2022

nicolo-ribaudo Jul 20, 2022

nicolo-ribaudo Jul 20, 2022 •

edited

JLHwung left a comment

liuxingbaoyu left a comment

nicolo-ribaudo commented Jul 20, 2022

merceyz Jul 21, 2022 •

edited

nicolo-ribaudo Jul 21, 2022

Extract string parsing to a separate package #14772

Extract string parsing to a separate package #14772

Conversation

nicolo-ribaudo commented Jul 19, 2022 • edited by gitpod-io bot

liuxingbaoyu commented Jul 20, 2022

babel-bot commented Jul 20, 2022 • edited

JLHwung Jul 20, 2022

Choose a reason for hiding this comment

nicolo-ribaudo Jul 20, 2022

Choose a reason for hiding this comment

nicolo-ribaudo Jul 20, 2022 • edited

Choose a reason for hiding this comment

JLHwung left a comment

Choose a reason for hiding this comment

liuxingbaoyu left a comment

Choose a reason for hiding this comment

nicolo-ribaudo commented Jul 20, 2022

merceyz Jul 21, 2022 • edited

Choose a reason for hiding this comment

nicolo-ribaudo Jul 21, 2022

Choose a reason for hiding this comment

nicolo-ribaudo commented Jul 19, 2022 •

edited by gitpod-io bot

babel-bot commented Jul 20, 2022 •

edited

nicolo-ribaudo Jul 20, 2022 •

edited

merceyz Jul 21, 2022 •

edited